Query Join Processing Over Uncertain Data for Decision Tree Classifiers

Authors

  • G. Kalyani
  • V. Yaswanth Kumar
Abstract

Traditional decision tree classifiers work with data whose values are known and precise. These classifiers can also be extended to handle data with uncertain information. Value uncertainty arises in many applications during the data collection process. Example sources of uncertainty include measurement/quantization errors, data staleness, and multiple repeated measurements. Rather than abstracting uncertain data by statistical derivatives, such as the mean and median, the accuracy of a decision tree classifier can be improved considerably if the complete information of a data item is used by utilizing its Probability Density Function (PDF). In particular, an attribute value can be modelled as a range of possible values, associated with a PDF. Query processing over PDF-based data has so far addressed only simple queries, such as range and nearest-neighbour queries. Queries that join multiple relations have not been addressed with PDFs. Given the significance of joins in databases, we address join queries over uncertain data. We propose semantics for the join operation, define probabilistic operators over uncertain data, and propose join algorithms that provide efficient execution of probabilistic joins, in particular probabilistic threshold joins, which avoid some of the semantic complexities of dealing with uncertain data. For this class of joins we develop three sets of optimization techniques: item-level, page-level, and index-level pruning. We will compare the performance of these techniques experimentally.
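The idea of a probabilistic threshold join with item-level pruning can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not spell out. It assumes, for illustration only, that each uncertain value is a discrete set of sample points with probabilities, that the two sides of the join are independent, that the join predicate is "equal within a resolution eps", and that the threshold tau is greater than 0. All names (`UncertainValue`, `match_probability`, `threshold_join`) are hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class UncertainValue:
    """Uncertain attribute: (value, probability) sample points summing to 1."""
    name: str
    samples: tuple

    @property
    def lo(self):
        # Lower end of the range of possible values.
        return min(v for v, _ in self.samples)

    @property
    def hi(self):
        # Upper end of the range of possible values.
        return max(v for v, _ in self.samples)


def match_probability(a, b, eps):
    """P(|A - B| <= eps), assuming A and B are independent."""
    return sum(pa * pb
               for (va, pa), (vb, pb) in product(a.samples, b.samples)
               if abs(va - vb) <= eps)


def threshold_join(r, s, eps, tau):
    """Return (a.name, b.name, p) for every pair with match probability >= tau.

    Assumes tau > 0, so pairs with probability 0 can be skipped safely.
    """
    result = []
    for a in r:
        for b in s:
            # Item-level pruning: if the ranges, widened by eps, do not
            # overlap, the match probability is 0 and the pair is skipped
            # without touching the PDFs.
            if a.lo - eps > b.hi or b.lo - eps > a.hi:
                continue
            p = match_probability(a, b, eps)
            if p >= tau:
                result.append((a.name, b.name, p))
    return result
```

In this sketch the range check plays the role of item-level pruning; page-level and index-level pruning would apply the same kind of bound to groups of items rather than to single pairs, and are not shown here.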

Related resources

Extending Decision Tree Classifiers for Uncertain Data

Traditionally, decision tree classifiers work with data whose values are known and precise. We extend such classifiers to handle data with uncertain information. Value uncertainty arises in many applications during the data collection process. Example sources of uncertainty include measurement/quantization errors, data staleness, and multiple repeated measurements. With uncertainty, the value o...

Optimizing Probabilistic Query Processing on Continuous Uncertain Data

Uncertain data management is becoming increasingly important in many applications, in particular, in scientific databases and data stream systems. Uncertain data in these new environments is naturally modeled by continuous random variables. An important class of queries uses complex selection and join predicates and requires query answers to be returned if their existence probabilities pass a t...

Chapter 10 INDEXING UNCERTAIN DATA

As the volume of uncertain data increases, the cost of evaluating queries over this data will also increase. In order to scale uncertain databases to large data volumes, efficient query processing methods are needed. One of the key techniques for efficient query evaluation is indexing. Due to the nature of uncertain data and queries over this data, existing indexing solutions for precise data a...

Fast Reachability Query Processing

Graph has great expressive power to describe the complex relationships among data objects, and there are large graph datasets available. In this paper, we focus ourselves on processing a primitive graph query. We call it reachability query. The reachability query, denoted A ⇝ D, is to find all elements of a type D that are reachable from some elements in another type A. The problem is challenging...

Scalable Statistical Modeling and Query Processing over Large Scale Uncertain Databases

Title of Dissertation: SCALABLE STATISTICAL MODELING AND QUERY PROCESSING OVER LARGE SCALE UNCERTAIN DATABASES. Bhargav Kanagal Shamanna, Doctor of Philosophy, 2011. Dissertation directed by: Dr. Amol Deshpande, Dept. of Computer Science. The past decade has witnessed a large number of novel applications that generate imprecise, uncertain and incomplete data. Examples include monitoring infrastructu...

Publication date: 2012